Lei Shi, IBM Research -
Weihong Qian, IBM Research -
Furu Wei, IBM Research –
We use VisWorks TIARA for the text visual analytics
part and VisWorks VIGOR for the network analytics part. VisWorks VIGOR is built
on the VisWorks Peony visualization framework. All of these tools were developed
by members of the smart visual analytics team, IBM Research
Video:
Our video (in Flash format; drag the file into a browser to display it)
ANSWERS:
MC1.1: Summarize the
activities that happened in each country with respect to illegal arms deals
based on a synthesis of the information from the different report types and
sources. State the situation in each
country at the end of the period (i.e. the end of the information you have been
given) with respect to illegal arms deals being pursued. Present a hypothesis about the next
activities you expect to take place, with respect to the people, groups, and
countries.
A. Summarization by Country:
The summary is attached at the end of the two MCs,
since it combines the results of both text and network visual analytics.
B. Hypothesis of the Next Activity:
All clues indicate that
there is a group of arms dealers in
C. Analytics Processes:
The process consists of
three steps: pre-processing the data, composing the visualization, and finally
conducting visual analytics for the tasks.
C.1 Data Pre-Processing:
We start by segmenting the raw text data provided
into snippets according to the title line of each smaller report/message/post.
These titles are well formatted, so both the snippet boundaries and the exact
time of each snippet can be extracted by regular expression matching.
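The segmentation step can be sketched as follows. This is only a minimal illustration: the title-line layout (`== kind | YYYY-MM-DD HH:MM ==`), the pattern `TITLE_RE`, and the function `split_snippets` are assumptions made for the example, not the actual format of the provided data or the code we ran.

```python
import re
from datetime import datetime

# Hypothetical title-line layout, e.g. "== Intercept | 2008-02-14 09:30 ==".
# The real data uses its own title format; adapt the pattern accordingly.
TITLE_RE = re.compile(
    r"^== (?P<kind>[^|]+?) \| (?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}) ==$",
    re.MULTILINE,
)


def split_snippets(raw):
    """Cut raw text into snippets at each title line and parse its timestamp."""
    matches = list(TITLE_RE.finditer(raw))
    snippets = []
    for i, m in enumerate(matches):
        # A snippet runs from the end of its title to the start of the next title.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        snippets.append({
            "kind": m.group("kind"),
            "time": datetime.strptime(m.group("stamp"), "%Y-%m-%d %H:%M"),
            "text": raw[m.end():end].strip(),
        })
    return snippets
```

Each returned record carries the snippet type, its timestamp, and its body text, which is the form the later steps need.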
We then extract named entities, including
people/location/organization, with the Stanford Named Entity Recognizer
(http://nlp.stanford.edu/software/CRF-NER.shtml). We
then map the locations to countries and also use a part-of-speech (POS) parser
to extract verbs from the snippets for activity summarization.
Because the data mentions quite a lot of
countries, we cluster the countries over the country
co-occurrence graph. In Fig. 1(a), node size indicates the number of snippets
in which a country occurs, and an edge indicates co-occurrence within the same
snippet. To simplify the graph, we filter out all nodes with a count of less
than 4 and all edges with a count of less than 2, and then remove the three
bridge nodes (which should be intermediate countries rather than the ones
requiring arms). This yields 8 groups, which we call topics, each named after
its central country. Each snippet is then mapped to one topic. Some manual
effort is needed to further check the topic (country) classification, about
30 minutes for one person. We also manually check the named entities
(especially the people) and create a name map that translates typos and short
names into standard names. This takes approximately 1 hour for one person.
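The filtering and grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `cluster_countries` and its parameters are hypothetical, the groups are taken to be the connected components that remain after filtering, and the "central country" is approximated by the most frequently occurring country in each group. The bridge nodes are passed in by hand, as they were chosen by inspection.

```python
from collections import Counter
from itertools import combinations


def cluster_countries(snippet_countries, min_node=4, min_edge=2, bridges=()):
    """Group countries by co-occurrence after count filtering and bridge removal."""
    node_count = Counter()
    edge_count = Counter()
    for countries in snippet_countries:
        unique = sorted(set(countries))  # count each country once per snippet
        node_count.update(unique)
        edge_count.update(combinations(unique, 2))

    # Keep frequent countries, dropping the manually chosen bridge nodes.
    keep = {c for c, n in node_count.items()
            if n >= min_node and c not in bridges}
    adj = {c: set() for c in keep}
    for (a, b), n in edge_count.items():
        if n >= min_edge and a in keep and b in keep:
            adj[a].add(b)
            adj[b].add(a)

    # Connected components of the filtered graph become the topics.
    groups, seen = [], set()
    for start in sorted(keep):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            c = stack.pop()
            if c in comp:
                continue
            comp.add(c)
            stack.extend(adj[c] - comp)
        seen |= comp
        # Assumption: the most frequent country stands in for the "central" one.
        name = max(sorted(comp), key=lambda c: node_count[c])
        groups.append((name, comp))
    return groups
```

Calling the function once without `bridges` and once with them shows how removing the intermediate countries splits one large component into separate topics.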
(a) (b)
Figure 1. Country Graphs: (a) Original Country Co-Occurrence Graph; (b)
After node/edge count filtering and deleting the bridge node of “
C.2 Visualization:
After the first step, we get a time-evolving
document (snippet) collection with several tagged fields and a topic (country
group) classification. We leverage VisWorks TIARA to visualize it.
Fig. 2 gives an overview of the visualization,
where the X axis is mapped to time, the Y axis gives the number of documents in
each time period, and the vertical layers correspond to topics. The keywords in
each layer show the time-sensitive entities within each topic. The keyword size
indicates the occurrence count. A field navigation panel on the left
controls which categories of keywords are shown.
Figure 2. Overview of the data in TIARA, with all the keyword fields
selected.
C.3 Visual Analytics:
To meet the requirements of the task, we
drill down into the content of each country group with TIARA. Here we only show
how we analyze the arms dealing info in
We select only the
In the activity sub-trend, we clearly see
the theme shift through “call-send-transfer (money)-meet”. We therefore
hypothesize that the local Venezuelan buyer first calls the dealer to discuss
the deal, then sends something (perhaps the arms list), transfers the money, and
finally arranges to meet.
We further locate some key persons in the
arms dealing of
To help summarize the situation at the end of the
period, we drill into the most recent time span (from Dec. 2008 to the latest
data) and click on the layer to let TIARA extract the most important sentences
in the snippets, as shown in Fig. 5. The sentences are ranked by combining the
weights of the entities (people/location/activity/time) within each. The top
five sentences are highlighted. From these, we conclude that both jhon (the
intermediary) and Vwhombre (probably the local buyer) will meet jt (the arms
dealer) in the UAE in late April 2009.
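The ranking idea can be sketched as follows. This is not TIARA's actual scoring, which is not detailed here; it is a minimal illustration assuming that each entity has a numeric weight and that a sentence's score is the sum of the weights of the entities it contains. The function `rank_sentences` and the weights are hypothetical, and only single-word entities are matched.

```python
import re


def rank_sentences(text, entity_weights, top_k=5):
    """Return up to top_k sentences, ordered by summed entity weight."""
    # Naive sentence split on end punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for idx, sentence in enumerate(sentences):
        tokens = set(re.findall(r"\w+", sentence.lower()))
        # Each entity counts once per sentence, however often it appears.
        score = sum(w for e, w in entity_weights.items() if e.lower() in tokens)
        scored.append((score, idx, sentence))
    # Highest score first; ties keep their original order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [s for score, _, s in scored[:top_k] if score > 0]
```

Sentences that mention no weighted entity are dropped, so fewer than `top_k` sentences may be returned.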
Figure 3. Detailed view of
Figure 4. Drilling into the key person “jhon”: the snippets
containing “jhon” and the alias “jg” are retrieved and
listed in the right panel, sorted by time.
Figure 5. Select the most recent info of this topic and click on the
layer to show sentence summarizations in the right panel.
MC1.2: Illustrate the associations among the players in the arms dealing through a social network. If there are linkages among countries, please highlight these as well in the social network. Our analysts are interested in seeing different views of the social network that might help them in counterintelligence activities (people, places, activities, communication patterns that are key to the network).
For this task, we start
from the player social network in Fig. 1. The people name extraction method is
the same as in task 1.1. In this figure, the size of each person icon is mapped
to that person's occurrence count in snippets, while an edge indicates the
co-occurrence of two people in the same snippet. Figure 1 shows that the player
network is composed of several connected components, where “Nicolai” and
“Mikhail” connect the largest component together.
Figure 2 places the
national flags over people to indicate their countries, while inter-country
connections are highlighted in orange. Again, “Nicolai” and “Mikhail”,
this time joined by “Arkadi” and “Boonmee”, behave as
international players.
Figure 3 further
maps the graph betweenness centrality score to the icon size, so that
“Nicolai”, “Mikhail”, “Arkadi”, “Saleh”, and “Nahid”
stand out as bridges in the network.
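For reference, betweenness centrality on an unweighted, undirected co-occurrence network can be computed with Brandes' algorithm. The sketch below is a plain-Python illustration and not the VisWorks VIGOR implementation; the function name `betweenness` and the adjacency-dictionary input format are assumptions made for the example.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm; adj maps each node to the set of its neighbors."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting nodes farthest from s first.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected graph every pair is counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}
```

A node that lies on all shortest paths between two otherwise separate clusters receives the highest score, which is why mapping the score to icon size makes the bridge players easy to spot.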
Figure 4 introduces
additional locations (such as
Figure 5 maps the
country group (topic) category (extracted in task 1.1) onto each person, and
highlights the 1-hop closure of “
Figure 1. Player network with node size mapped to occurrence count.
Figure 2. Player network highlighting country info and inter-country
connections.
Figure 3. Player network with node size mapped to graph betweenness
centrality.
Figure 4. Player+Location+Organization network, the whole picture is
almost interconnected by the new locations and organizations.
Figure 5. Synthesized network with country group category mapped to
node color.
A. Summarization by Country: (the result by
combining text and network visual analytics)
A.1 Pakistan:
Activities:
• Lashkar-e-Jhangvi is suspected of having
planned a bomb attack in late Feb. 2008.
• Another plan is being formed by Lashkar-e-Jhangvi
remnants near
• Bukhari, a suspected top leader of
Lashkar-e-Jhangvi, transfers money out twice, first in Feb. 2008 to
Situations:
• Bukhari will go to the Burj hotel,
A.2
Activities:
• One cargo plane is seized in
• Boonmee is frequently in contact with Nicolai,
Arkadi and Lim.
Situations:
• The arms dealer for
• Boonmee will meet Nicolai Kuryakin in
Hypothesis:
• Arkadi is a
• Boonmee is a local arms dealer in Southeast
Asia. He got arms from Arkadi and Nicolai, and sold them to
A.3
Activities:
• MFJ (terrorist group in
• Khouri, an MFJ sympathizer and Kasem’s
friend, gets a contact (from the Russian Army) for ammunition. Kasem and Anka
finally decide to take a flight to
Situations:
• Kasem, Khouri and Anka will fly to
A.4
Activities:
•
• November 14, 2008,
• Jhon requests hundreds of arms from jtomski
in October 2008.
• Vwhombre requests arms from jtomski
Situations:
• Jhon will have a meeting with jtomski on
April 23, 2009 at Arab Sail.
• Vwhombre will meet jtomski on April 22, 2009
at a hotel in
Hypothesis:
•
• Jhon is probably a local arms dealer in
A.5
Activities:
• Baltasar, a suspected leader of a
• They connect to the arms dealer through a
Situations:
• Celik, Hakan and Kaya will travel to
• Baltasar, Adad and Ashur will go to
Hypothesis:
•
A.6
Activities:
• Funsho Kapolalum (probably Dr.
George’s partner) arranges with Mikhail to transfer money out
• A list is sent by Dr. George; it is
suspected to be the required arms list.
• Mikhail calls
Situations:
• They will meet on 15 April 2009 in
A.7
Activities:
• During the MP training in September 2008, some
arms were moved and lost.
• Arms are found in the house of Thabiti Otieno in October
2008. He is arrested together with MP officers.
• Thabiti Otieno and his wife Nahid Owiti are
charged with possession of ammunition, but later acquitted.
• A cargo ship carrying arms is captured by
pirates in October 2008. The owner pays the ransom in March 2009.
Situations:
•
Hypothesis:
• Thabiti and Nahid could be the local contacts
for arms dealing in Africa (
A.8
Activities:
• Weapons are seized in Dafa. Saleh Ahmed,
leader of the smuggling, fled to
• Saleh is supplying the rebels of
• Saleh arranges to buy ammunition from
Mikhail.
Situations:
• Saleh and Mikhail (also Nicolai) will meet
at the Burj,
Hypothesis:
• Saleh is a local arms dealer in the Middle East
(